Abstract. It is decidable for deterministic MSO definable graph-to-string or
graph-to-tree transducers whether they are equivalent on a context-free set of graphs.



It is well known that the equivalence problem for nondeterministic (one-way) fi- 
nite state transducers is undecidable, even when they cannot read or write the empty 
string [Gri68]. In contrast, equivalence is decidable for deterministic finite state trans- 
ducers, even for two-way transducers [Gur82]. The question arises whether these results 
can be generalized from strings to transducers working on more complex structures like, 
e.g., trees or graphs. There is no accepted notion of finite state transducer working on 
graphs; instead, it is believed that transductions expressed in monadic second-order 
logic (MSO) are the natural counterpart of finite state transductions on graphs. The 
idea is to define an output graph by interpreting fixed MSO formulas on a given in- 
put graph. In fact, if the input and output graphs of such an MSO graph transducer are 
strings, then the resulting transductions (in the deterministic case) are precisely the de- 
terministic two-way finite state transductions [EH01]. Hence, by the above, equivalence 
is decidable for deterministic MSO string transducers. A nondeterministic MSO graph 
transducer can easily simulate a nondeterministic finite state transducer that cannot read 
the empty string; hence, equivalence is undecidable. Actually, even for deterministic 
MSO graph transducers equivalence is undecidable. This is due to the fact that MSO is
undecidable for graphs (Propositions 5.2.1 and 5.2.2 of [Cou97]). The question remains
whether deterministic MSO tree transducers have a decidable equivalence problem. Recently,
these transducers have been characterized by certain attribute grammars [BE00]
and macro tree transducers [EM99]. However, for both models it is unknown whether 
equivalence is decidable. Here we give an affirmative answer: equivalence of determin- 
istic MSO tree transducers is decidable. This result has several applications; for in- 
stance, it implies that XML queries of linear size increase have decidable equivalence, 
by the results of [MSV03], [EM03a], [EM03b], and [Man03]. Our proof generalizes 
the one of [Gur82] (see also [Iba82]): it is based on the fact that certain sets are semi- 
linear. The reader is assumed to be familiar with MSO on graphs and with MSO graph 
transducers, see, e.g., [Cou97,Cou94]. 

Convention: All lemmas stated in this paper are effective. 

A graph alphabet is a pair $(\Sigma, \Gamma)$ of alphabets of node and edge labels, respectively.
A graph over $(\Sigma, \Gamma)$ is a tuple $(V, E, \lambda)$ where $V$ is the finite set of nodes,
$E \subseteq V \times \Gamma \times V$ is the set of edges, and $\lambda : V \to \Sigma$ is the node
labeling function. The set of all graphs over $(\Sigma, \Gamma)$ is denoted $GR(\Sigma, \Gamma)$.
The language $MSO(\Sigma, \Gamma)$ of monadic second-order (MSO) formulas over $(\Sigma, \Gamma)$
uses node variables $x, y, \ldots$ and node-set variables $X, Y, \ldots$; both can be quantified
with $\exists$ and $\forall$. It has atomic formulas $lab_\sigma(x)$ for $\sigma \in \Sigma$,
denoting that $x$ is labeled $\sigma$, $edg_\gamma(x, y)$ for $\gamma \in \Gamma$, denoting that
there is a $\gamma$-labeled edge from $x$ to $y$, and $x \in X$ denoting that $x$ is in $X$.
For $g \in GR(\Sigma, \Gamma)$ and a closed formula $\varphi$ in $MSO(\Sigma, \Gamma)$ we write
$g \models \varphi$ if $g$ satisfies $\varphi$; similarly, if $\varphi$ has free variables $x$ or
$x, y$ and $u, v$ are nodes of $g$, then we write $(g, u) \models \varphi$ or
$(g, u, v) \models \varphi$ if $g$ satisfies $\varphi$ with $x = u$ or with $x = u$, $y = v$,
respectively.

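Let $(\Sigma_1, \Gamma_1)$, $(\Sigma_2, \Gamma_2)$ be graph alphabets. A deterministic MSO graph
transducer $M$ (from $(\Sigma_1, \Gamma_1)$ to $(\Sigma_2, \Gamma_2)$) is a tuple
$(C, \varphi_{dom}, \Psi, X)$ where $C$ is a finite set of copy names,
$\varphi_{dom} \in MSO(\Sigma_1, \Gamma_1)$ is the closed domain formula,
$\Psi = \{\psi_{c,\sigma}(x)\}_{c \in C, \sigma \in \Sigma_2}$ is a family of node formulas, i.e.,
MSO formulas $\psi_{c,\sigma}(x)$ over $(\Sigma_1, \Gamma_1)$ with one free variable $x$, and
$X = \{\chi_{c,c',\gamma}(x, y)\}_{c,c' \in C, \gamma \in \Gamma_2}$ is a family of edge formulas,
i.e., MSO formulas $\chi_{c,c',\gamma}(x, y)$ over $(\Sigma_1, \Gamma_1)$ with two free variables
$x, y$.

Given $g \in GR(\Sigma_1, \Gamma_1)$, the graph $h = \tau_M(g) \in GR(\Sigma_2, \Gamma_2)$ is
defined if $g \models \varphi_{dom}$, and then
$V_h = \{(c, u) \mid c \in C, u \in V_g$, there is exactly one $\sigma \in \Sigma_2$ such that
$(g, u) \models \psi_{c,\sigma}(x)\}$,
$E_h = \{((c, u), \gamma, (c', u')) \mid (c, u), (c', u') \in V_h, \gamma \in \Gamma_2$, and
$(g, u, u') \models \chi_{c,c',\gamma}(x, y)\}$, and
$\lambda_h = \{((c, u), \sigma) \mid (c, u) \in V_h, \sigma \in \Sigma_2$, and
$(g, u) \models \psi_{c,\sigma}(x)\}$.
Hence, $\tau_M$ is a partial function from $GR(\Sigma_1, \Gamma_1)$ to $GR(\Sigma_2, \Gamma_2)$
with $dom(\tau_M) = \{g \in GR(\Sigma_1, \Gamma_1) \mid g \models \varphi_{dom}\}$.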
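The definition of the output graph can be read as a small evaluation procedure. The following is a minimal sketch under strong assumptions: formulas are represented by arbitrary Python predicates rather than MSO syntax, and the names `apply_transducer`, `node_formulas` and `edge_formulas` are hypothetical, chosen only for this illustration.

```python
from itertools import product

def apply_transducer(graph, copies, dom, node_formulas, edge_formulas):
    """Evaluate a deterministic transducer on graph = (nodes, edges, labels).

    Hypothetical encoding: formulas are Python predicates, not MSO syntax.
    dom(graph) -> bool
    node_formulas[(c, sigma)](graph, u) -> bool
    edge_formulas[(c, c2, gamma)](graph, u, v) -> bool
    Returns (nodes, edges, labels) of the output, or None outside the domain.
    """
    nodes, _edges, _labels = graph
    if not dom(graph):
        return None
    # V_h and lambda_h: keep (c, u) only if exactly one output label applies.
    out_labels = {}
    for c, u in product(copies, nodes):
        hits = [s for (c2, s), f in node_formulas.items() if c2 == c and f(graph, u)]
        if len(hits) == 1:
            out_labels[(c, u)] = hits[0]
    # E_h: edges between surviving output nodes whose edge formula holds.
    out_edges = set()
    for (c, c2, gamma), f in edge_formulas.items():
        for (cu, u), (cv, v) in product(out_labels, repeat=2):
            if cu == c and cv == c2 and f(graph, u, v):
                out_edges.add(((c, u), gamma, (c2, v)))
    return set(out_labels), out_edges, out_labels

# Example: one copy, keep every #-labeled node, define no edges.
g = ({0, 1}, {(0, "a", 1)}, {0: "#", 1: "#"})
h = apply_transducer(g, [1], lambda gr: True,
                     {(1, "#"): lambda gr, u: gr[2][u] == "#"}, {})
```

In the example, the single copy name and the empty family of edge formulas turn the input into a graph without edges, which is the shape of transducer used later to reach discrete graphs.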

A (nondeterministic) MSO graph transducer is obtained from a deterministic one by
allowing all formulas to use fixed free node-set variables $Y_1, Y_2, \ldots$, called parameters.
For each valuation of the parameters (by sets of nodes of the input graph) that satisfies
the domain formula, the other formulas define the output graph as before. Hence each
such valuation may lead to a different output graph for the given input graph. Thus,
$\tau_M \subseteq GR(\Sigma_1, \Gamma_1) \times GR(\Sigma_2, \Gamma_2)$.

The following lemma contains a basic fact about MSO definable graph transduc- 
tions; see, e.g., Proposition 3.2 in [Cou94]. 

Lemma 1. The (deterministic) MSO graph transductions are closed under composi- 
tion. 

Notation. Let $M_1; M_2$ denote a transducer $M$ for which $\tau_M = \tau_{M_2} \circ \tau_{M_1}$;
note that $M$ is deterministic if $M_1$ and $M_2$ are. By Lemma 1, $M_1; M_2$ effectively exists.

In the sequel we often identify a transducer $M$ with its transduction $\tau_M$, and simply
write, e.g., $M(g)$ in place of $\tau_M(g)$.

Let $M$ be an MSO graph transducer and let $X, Y$ be sets of graphs. Then $M$ is
called an MSO X-to-Y transducer if $dom(M) \subseteq X$ and $range(M) \subseteq Y$, and it is an
MSO X transducer if additionally $Y = X$.

A discrete graph (dgraph, for short) is a graph without edges. Let $g$ be a dgraph over
$(\Sigma, \emptyset)$ with $\Sigma = \{\sigma_1, \ldots, \sigma_k\}$. Define $Par(g)$ as the vector
$(n_1, \ldots, n_k)$ in $\mathbb{N}^k$ such that, for $1 \le i \le k$, $n_i$ is the number of
$\sigma_i$-labeled nodes in $g$. Similarly, for a string $w \in \Sigma^*$, $Par(w)$ is the vector
in $\mathbb{N}^k$ such that the $i$-th component is the number of $\sigma_i$'s in $w$. We denote
by $dgr(w)$ the (unique) dgraph $g$ such that $Par(g) = Par(w)$. For a set $S$ of dgraphs or
strings, $Par(S)$ is the set of all $Par(g)$ for $g \in S$. A set $P \subseteq \mathbb{N}^k$ is
semilinear if there exists a regular language $R$ such that $P = Par(R)$. The set $S$ is Parikh
if $Par(S)$ is semilinear. Note that since $Par(R) = \emptyset$ iff $R = \emptyset$, emptiness of
semilinear sets is decidable.
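For concreteness, the Parikh mapping on strings can be sketched as follows. This is only an illustration: the function names `par` and `dgr` mirror the notation above, and representing a dgraph as a multiset of node labels is an assumption of the sketch (it is adequate because a dgraph is determined, up to isomorphism, by how many nodes carry each label).

```python
from collections import Counter

def par(word, alphabet):
    """Parikh vector of word: one count per symbol, in the order of alphabet."""
    counts = Counter(word)
    return tuple(counts[s] for s in alphabet)

def dgr(word):
    """Discrete graph of word, modelled as a multiset of node labels (no edges)."""
    return Counter(word)

example = par("abab", "ab")
```

Two strings have the same dgraph exactly when they have the same Parikh vector, so the order of letters is deliberately forgotten.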
A set of graphs is NR if it is generated by a context-free node replacement graph
grammar, see, e.g., [Eng97,Cou94]; it is also called C-edNCE or VR.

Lemma 2. (Theorem 7.1 of [Cou94]) The images of NR sets of graphs under MSO
graph-to-dgraph transductions are Parikh.

In fact, the class of NR sets of graphs is closed under MSO graph transductions (see
Theorem 4.2(3) of [Cou94], or Section 5 of [Eng97]) and NR sets of graphs are Parikh
(see Proposition 4.11 of [Eng97]).

A useful property of semilinear sets is their (effective) closure under intersection. It 
implies the following lemma. 

Lemma 3. It is decidable for a semilinear set $S \subseteq \mathbb{N}^2$ whether there exists an
$n \in \mathbb{N}$ such that $(n, n) \in S$.

Proof. Let $P = \{(n, n) \mid n \in \mathbb{N}\} = Par((ab)^*)$. The lemma holds because
$S \cap P$ is semilinear [GS64] and semilinear sets have a decidable emptiness problem. □
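As an illustration of Lemma 3 (a worked example that is not taken from the source), consider the regular language $R = (aab)^*(abb)^*$ over $\{a, b\}$ and let $S = Par(R)$. Intersecting with the diagonal $P$ gives:

```latex
\begin{align*}
S &= \{(2i + j,\; i + 2j) \mid i, j \in \mathbb{N}\},\\
(n, n) \in S &\iff \exists i, j:\ 2i + j = i + 2j = n \iff \exists i:\ n = 3i,\\
S \cap P &= \{(3i, 3i) \mid i \in \mathbb{N}\} = Par((aaabbb)^*) \neq \emptyset.
\end{align*}
```

Here the intersection is again the Parikh image of a regular language, and it is nonempty, so the answer for this $S$ is yes.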

We identify the string $w = a_1 a_2 \cdots a_n$ with the graph that has #-labeled nodes
$v_1, \ldots, v_{n+1}$ and, for $1 \le i \le n$, an $a_i$-labeled edge from $v_i$ to $v_{i+1}$.
For $1 \le i \le n$, we denote by $w/i$ the $i$-th letter $a_i$ of $w$.

Lemma 4. Let $\Delta$ be an alphabet and $a \in \Delta$. There exists an MSO string-to-dgraph
transducer $N_a^\Delta$ such that for every $w \in \Delta^*$,

$N_a^\Delta(w) = \{dgr(a^n) \mid w/n = a\}$.

Proof. The transducer $N_a^\Delta$ uses one parameter $Y_1$ to nondeterministically choose a node
$v$ that has an outgoing $a$-labeled edge. It copies $v$ and all input nodes to the left of $v$,
and labels them $a$. There are no edge formulas because dgraphs have no edges. Define

$N_a^\Delta = (\{1\}, \varphi_{dom}(Y_1), \psi_{1,a}(x, Y_1), \emptyset)$ with

$\varphi_{dom}(Y_1) = singleton(Y_1) \wedge (\exists x)(\exists y)(edg_a(x, y) \wedge x \in Y_1)$
$\psi_{1,a}(x, Y_1) = (\exists y)(x \preceq y \wedge y \in Y_1)$

where $singleton(Y_1)$ expresses that $Y_1$ is a singleton, and $x \preceq y$ that there is a
(possibly empty) path from $x$ to $y$. □

We denote the disjoint union of graphs $h_1$ and $h_2$ by $h_1 \uplus h_2$.

Lemma 5. Let $M_1, M_2$ be MSO graph transducers. There exists an MSO graph transducer
$M$, denoted $M_1 \uplus M_2$, such that for every graph $g$,

$M(g) = \{h_1 \uplus h_2 \mid h_1 \in M_1(g), h_2 \in M_2(g)\}$.

Proof. Let $M_1 = (C_1, \varphi_1, \Psi_1, X_1)$ and $M_2 = (C_2, \varphi_2, \Psi_2, X_2)$. We may
assume w.l.o.g. that $C_1$ is disjoint from $C_2$ and that the parameters of $M_1$ are disjoint
from those of $M_2$. Then
$M = (C_1 \cup C_2, \varphi_1 \wedge \varphi_2, \Psi_1 \cup \Psi_2, X_1 \cup X_2 \cup X)$ realizes
the desired transduction, where all edge formulas in $X$ are set to false. □
Lemma 6. Let $M_1, M_2$ be MSO graph-to-string transducers and let $a, b$ be distinct
symbols. There exists an MSO graph-to-dgraph transducer $M^{a,b}$ such that for every
graph $g$,

$M^{a,b}(g) = \{dgr(a^m b^n) \mid \exists h_1 \in M_1(g), h_2 \in M_2(g) : h_1/m = a$ and $h_2/n = b\}$.

Proof. Let $M_i$ be from $(\Sigma_i, \Gamma_i)$ to $(\{\#\}, \Delta_i)$ for $i \in \{1, 2\}$. If
$a \notin \Delta_1$ or $b \notin \Delta_2$ then let $M^{a,b} = (\emptyset, false, \emptyset, \emptyset)$.
Otherwise define $M^{a,b} = (M_1; N_a^{\Delta_1}) \uplus (M_2; N_b^{\Delta_2})$
according to Lemmas 1, 4, and 5. □

Let $\Sigma$ be a ranked alphabet, i.e., an alphabet $\Sigma$ together with a mapping
$rank_\Sigma : \Sigma \to \mathbb{N}$. Let $m$ be the maximal rank of symbols in $\Sigma$. A tree
(over $\Sigma$) is an acyclic, connected graph in $GR(\Sigma, \{1, \ldots, m\})$, with exactly one
node that has no incoming edges (the root), and, for $\sigma \in \Sigma$, every $\sigma$-labeled
node has exactly $rank_\Sigma(\sigma)$ outgoing edges, labeled $1, 2, \ldots, rank_\Sigma(\sigma)$,
respectively.

For a relation $R \subseteq A \times B$ and a set $D \subseteq A$, denote by $R|_D$ the restriction
of $R$ to $D$, i.e., $R|_D = \{(a, b) \in R \mid a \in D\}$.

Theorem 7. It is decidable for deterministic MSO graph-to-string or graph-to-tree
transducers $M_1, M_2$ and an NR set $D$ of graphs whether $\tau_{M_1}|_D = \tau_{M_2}|_D$.

Proof. We start with the graph-to-string case. For $i \in \{1, 2\}$ let $D_i = dom(M_i) \cap D$.
We first show that it is decidable whether $D_1 = D_2$. Clearly, $D_1 = D_2$ if and only if
$Par(E(D)) = \emptyset$, where $E$ is the deterministic MSO graph-to-dgraph transducer that
removes the edges of all graphs in the symmetric difference of $dom(M_1)$ and $dom(M_2)$:
$E = (\{1\}, \neg(\varphi_1 \leftrightarrow \varphi_2), \{\psi_{1,\sigma}(x)\}_{\sigma \in \Sigma}, \emptyset)$
where $\varphi_i$ is the domain formula of $M_i$ for $i \in \{1, 2\}$, $\Sigma$ is the node alphabet
of $D$, and $\psi_{1,\sigma}(x) = lab_\sigma(x)$ for $\sigma \in \Sigma$. By Lemma 2, $Par(E(D))$ is
effectively semilinear, and hence its emptiness can be decided. If $D_1 \neq D_2$ then we are
finished and know that $\tau_{M_1}|_D \neq \tau_{M_2}|_D$. Assume now that $D_1 = D_2$.

Let $M_i$ have output edge alphabet $\Delta_i$, for $i \in \{1, 2\}$, and let $\$$ be a symbol not
in $\Delta = \Delta_1 \cup \Delta_2$. We define deterministic MSO graph-to-string transducers
$M_i^{\$} = M_i; N$ such that $M_i^{\$}(g) = M_i(g)\$$ for all $g \in dom(M_i)$. Here $N$ is the
deterministic MSO string transducer
$(C, true, \{\psi_{1,\#}(x), \psi_{2,\#}(x)\}, \{\chi_{c,c',\delta}(x, y)\}_{c,c' \in C, \delta \in \Delta \cup \{\$\}})$
such that $C = \{1, 2\}$, $\psi_{1,\#}(x) = true$,
$\psi_{2,\#}(x) = \chi_{1,2,\$}(x, y) = \neg(\exists z) \bigvee_{\delta \in \Delta} edg_\delta(x, z)$ and,
for $\delta \in \Delta$, $\chi_{1,1,\delta}(x, y) = edg_\delta(x, y)$; all other edge formulas are set
to false.

Since now all output strings end with the special marker $\$$,
$\tau_{M_1}|_D \neq \tau_{M_2}|_D$ iff

$\exists a \exists b : (d(a, b) \wedge \exists n \exists g : (g \in D_1 \wedge M_1^{\$}(g)/n = a \wedge M_2^{\$}(g)/n = b))$

where $d(a, b)$ denotes the statement $a, b \in (\Delta \cup \{\$\}) \wedge a \neq b$. For given
$a, b$, let $M^{a,b}$ be the transducer of Lemma 6 for $a$, $b$, $M_1^{\$}$, $M_2^{\$}$. Then the
statement displayed above holds if and only if

$\exists a \exists b : (d(a, b) \wedge \exists n : dgr(a^n b^n) \in M^{a,b}(D))$
iff $\exists a \exists b : (d(a, b) \wedge \underbrace{\exists n : (n, n) \in Par(M^{a,b}(D))}_{P(a,b)})$.
By Lemma 2, $Par(M^{a,b}(D))$ is effectively semilinear. By Lemma 3 this means that
$P(a, b)$ is decidable. Since there are only finitely many $a, b$ with $d(a, b)$, the statement
is decidable.
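The role of the end marker and of the diagonal $(n, n)$ can be seen on a single pair of output strings. The sketch below is not the decision procedure of the proof, which works on the whole NR set $D$ via semilinearity; it only checks, for two given strings, the condition "there are distinct letters $a, b$ and a position $n$ with $a$ at position $n$ of the first marked string and $b$ at position $n$ of the second". The function name `differ_via_diagonal` is hypothetical.

```python
def differ_via_diagonal(w1, w2, alphabet, marker="$"):
    """Return True iff w1 != w2, using only position/letter facts.

    Both strings get an end marker; they differ iff for some letters a != b
    and some n, (w1 + marker)/n = a and (w2 + marker)/n = b.
    """
    assert marker not in alphabet
    s1, s2 = w1 + marker, w2 + marker
    letters = list(alphabet) + [marker]
    for a in letters:
        for b in letters:
            if a == b:
                continue
            # (m, n) stands for the Parikh vector of dgr(a^m b^n).
            pairs = {(m, n)
                     for m, x in enumerate(s1, start=1) if x == a
                     for n, y in enumerate(s2, start=1) if y == b}
            # Lemma 3 asks whether such a set meets the diagonal.
            if any(m == n for m, n in pairs):
                return True
    return False
```

Without the marker, a string and a proper prefix of it would never disagree at a common position; with it, the shorter string shows the marker where the longer one shows a letter.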

We now reduce the graph-to-tree case to the graph-to-string case. Let $\Delta$ be a ranked
alphabet and let $m$ be the maximal rank of its elements. There is a deterministic MSO
tree-to-string transducer $M_\Delta$ that translates every tree $t$ over $\Delta$ into the string
$pre(t)$ of its node labels in pre-order. Clearly, if we associate with a deterministic MSO
graph-to-tree transducer $M$ (from $(\Sigma, \Gamma)$ to $(\Delta, \{1, \ldots, m\})$) the
deterministic MSO graph-to-string transducer $M' = M; M_\Delta$, then $M_1'$ is equivalent to
$M_2'$ on $D$ if and only if $M_1$ is equivalent to $M_2$ on $D$. Let
$M_\Delta = (\{1, 2\}, true, \{\psi_{1,\#}, \psi_{2,\#}\}, \{\chi_{c,c',\delta}\}_{c,c' \in \{1,2\}, \delta \in \Delta})$
with $\psi_{1,\#} = true$, $\psi_{2,\#} = root(x)$, where $root(x)$ expresses that $x$ is the root
node. Further, for $\delta \in \Delta$, $\chi_{1,1,\delta} = lab_\delta(x) \wedge \pi(x, y)$ and
$\chi_{1,2,\delta} = lab_\delta(x) \wedge root(y) \wedge \neg(\exists z)\, \pi(x, z)$ where
$\pi(x, y)$ expresses that $y$ is the successor of $x$ in the pre-order. □
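The reduction relies on the fact that, over a ranked alphabet, a tree can be recovered from the pre-order sequence of its node labels, so equal strings mean equal trees. A minimal sketch of this fact follows; the nested-tuple tree representation and the names `pre` and `decode` are assumptions made for the illustration.

```python
def pre(tree):
    """tree = (label, [children]); return the pre-order list of node labels."""
    label, children = tree
    out = [label]
    for child in children:
        out.extend(pre(child))
    return out

def decode(labels, rank):
    """Rebuild the tree from its pre-order labels, given rank[label]."""
    def parse(i):
        label = labels[i]
        i += 1
        children = []
        # The rank says how many subtrees follow this label.
        for _ in range(rank[label]):
            child, i = parse(i)
            children.append(child)
        return (label, children), i

    tree, end = parse(0)
    if end != len(labels):
        raise ValueError("trailing labels after a complete tree")
    return tree

rank = {"f": 2, "g": 1, "a": 0}
t = ("f", [("g", [("a", [])]), ("a", [])])
```

Since `decode(pre(t), rank)` returns `t` again, two trees over the same ranked alphabet with the same pre-order string must coincide.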

String and Tree Transductions. Clearly, Theorem 7 also holds if we restrict the input
graphs to strings or trees. In particular, deterministic MSO X-to-Y transducers have
decidable equivalence for all $X, Y \in \{$string, tree$\}$. For string transducers this reproves
the decidability result of [Gur82] (through [EH01]). For trees we obtain the following
new decidability result.

Corollary 8. The equivalence problem is decidable for deterministic MSO tree trans- 
ducers. 

Of course, even stronger statements hold; namely, given an NR set D of strings or 
trees, it is decidable if two deterministic MSO X-to-Y transducers are equivalent when 
restricted to D. For string transducers this means the following. 

Corollary 9. It is decidable whether two deterministic two-way finite state transducers 
are equivalent on an NR set of strings. 

As discussed in Section 6 of [Eng97], the NR sets of strings are the same as the 
ranges of deterministic tree-walking tree-to-string transducers. They properly contain, 
for instance, the context-free languages and the ranges of deterministic two-way finite 
state transducers. Since the NR sets of strings form a full AFL of Parikh languages, 
Corollary 9 is in fact a special case of the general decidability result for deterministic 
two-way finite state transducers in Theorem 5 of [Iba82]. It is incomparable to the 
decidability of equivalence of two such transducers on an NPDT0L language [CK87].

The two statements of the next corollary follow from the characterizations of deterministic
MSO definable tree transductions in [BE00] and [EM03b], respectively. Note
that a tree transducer is of linear size increase if the size of the output tree is at most 
linear in the size of the input tree. 

Corollary 10. The equivalence problem is decidable 

(1) for single-use restricted attributed tree transducers and

(2) for deterministic macro tree transducers of linear size increase. 
This result is incomparable with the decidability of the equivalence problem for
nonnested separated attributed/macro tree transducers proved in [CF82]. It remains 
open whether the equivalence problem is decidable for attributed tree transducers and 
for deterministic macro tree transducers. 

In [MSV03] the k-pebble tree transducer was introduced, and claimed to subsume
(the tree translation core of) all known XML query languages. Hence, we call determin- 
istic pebble tree transducers deterministic XML queries. Such queries can be simulated 
by compositions of macro tree transducers [EM03a]. If such compositions are of linear 
size increase, then they are MSO definable [Man03]. 

Corollary 11. The equivalence problem is decidable for deterministic XML queries of 
linear size increase. 